feat(aqua): compress baked registry blobs with zstd#9131
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the aqua-registry integration by compressing registry data with zstd and bundling it into the binary, which replaces the previous method of including numerous individual YAML files. It introduces metadata tracking for the baked-in registry, adds support for package aliases, and updates the build and release scripts to use a more efficient download-based workflow. Feedback suggests refactoring logic in build.rs and registry.rs to use more idiomatic Rust patterns and iterators.
| fn package_registries(packages: &[Value]) -> Result<Vec<PackageRegistry>> { | ||
| let mut registries = Vec::new(); | ||
| for package in packages { | ||
| let Some(id) = canonical_package_id(package) else { | ||
| continue; | ||
| }; | ||
| let content = package_registry_yaml(package)?; | ||
| let compressed_content = compress_registry_yaml(&content)?; | ||
| let aliases = package_aliases(package); | ||
| registries.push(PackageRegistry { | ||
| id, | ||
| compressed_content, | ||
| aliases, | ||
| }); | ||
| } | ||
| Ok(registries) | ||
| } |
There was a problem hiding this comment.
This function can be rewritten using iterators to be more idiomatic and concise.
fn package_registries(packages: &[Value]) -> Result<Vec<PackageRegistry>> {
packages
.iter()
.filter_map(|package| canonical_package_id(package).map(|id| (id, package)))
.map(|(id, package)| {
let content = package_registry_yaml(package)?;
let compressed_content = compress_registry_yaml(&content)?;
let aliases = package_aliases(package);
Ok(PackageRegistry {
id,
compressed_content,
aliases,
})
})
.collect()
}| fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> { | ||
| if let Some(content) = AQUA_STANDARD_REGISTRY_FILES.get(package_id) { | ||
| return Some(*content); | ||
| } | ||
|
|
||
| AQUA_STANDARD_REGISTRY_ALIASES | ||
| .get(package_id) | ||
| .and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical)) | ||
| .copied() | ||
| } |
There was a problem hiding this comment.
This function can be simplified by using or_else to create a more idiomatic and concise expression.
| fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> { | |
| if let Some(content) = AQUA_STANDARD_REGISTRY_FILES.get(package_id) { | |
| return Some(*content); | |
| } | |
| AQUA_STANDARD_REGISTRY_ALIASES | |
| .get(package_id) | |
| .and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical)) | |
| .copied() | |
| } | |
| fn baked_registry_file(package_id: &str) -> Option<&'static [u8]> { | |
| AQUA_STANDARD_REGISTRY_FILES | |
| .get(package_id) | |
| .copied() | |
| .or_else(|| { | |
| AQUA_STANDARD_REGISTRY_ALIASES | |
| .get(package_id) | |
| .and_then(|canonical| AQUA_STANDARD_REGISTRY_FILES.get(*canonical)) | |
| .copied() | |
| }) | |
| } |
Greptile SummaryThis PR compresses per-package aqua registry YAML blobs with zstd at build time (via the Confidence Score: 5/5Safe to merge — implementation is correct, well-tested, and only remaining finding is a P2 style issue. The compression/decompression logic is correct, types are properly updated throughout, the lazy decompression is gated by the existing AquaPackage cache, and tests cover canonical lookup, path-only lookup, alias resolution, and metadata. The only finding is a cosmetic grouping issue in Cargo.toml. crates/aqua-registry/Cargo.toml — minor: zstd placed under # Logging section Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["registry.yaml\n(~3MB)"] -->|build.rs| B["Parse packages\n(serde_yaml)"]
B --> C["Per-package YAML\n(String)"]
C -->|"zstd::encode_all\n(DEFAULT_COMPRESSION_LEVEL)"| D["Compressed blob\n(.zst, Vec<u8>)"]
D -->|"fs::write"| E["OUT_DIR/aqua_standard_registry_blobs/{idx}.zst"]
E -->|"include_bytes! (compile-time)"| F["Binary static bytes\n(&'static [u8])"]
F -->|"LazyLock HashMap"| G["AQUA_STANDARD_REGISTRY_FILES\nHashMap<&str, &[u8]>"]
H["fetch_registry(pkg_id)"] --> I{local .git?}
I -->|Yes| J["Read pkgs/.../registry.yaml\nfrom disk"]
I -->|No| K{use_baked_registry?}
K -->|Yes| L["baked_registry_file(pkg_id)\nor via alias map"]
L -->|"zstd::decode_all"| M["Decompressed YAML bytes"]
M -->|"serde_yaml::from_slice"| N["RegistryYaml → AquaPackage\n(cached in Mutex<HashMap>)"]
K -->|No| O["RegistryNotAvailable error"]
Reviews (1): Last reviewed commit: "feat(aqua): compress baked registry blob..." | Re-trigger Greptile |
| zstd = "0.13" | ||
|
|
There was a problem hiding this comment.
zstd grouped under # Logging comment
zstd is placed directly after log = "0.4" under the # Logging section header, which will confuse future readers looking for compression-related dependencies.
| zstd = "0.13" | |
| # Logging | |
| log = "0.4" | |
| # Compression | |
| zstd = "0.13" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
This compresses the size of binary 1,867,728 bytes; considering the original registry file is 3,003,656 bytes, this is good. However I think we don't need this as the binary is already 80MB and zstd compression costs in runtime. |
Summary
include_bytes!and keep alias/canonical lookup behavior unchangedBinary size (serious profile, all features)
83,100,752bytes81,233,024bytes-1,867,728bytes (-2.25%)Notes
codex/aqua-baked-registry-yamland is expected to merge after refactor(aqua): bake aqua registry from merged yaml #9043